Representation of Morphosyntactic Units and Coordination Structures in the Turkish Dependency Treebank
نویسندگان
چکیده
This paper presents our preliminary conclusions as part of an ongoing effort to construct a new dependency representation framework for Turkish. We aim for this new framework to accommodate the highly agglutinative morphology of Turkish as well as to allow the annotation of unedited web data, and shape our decisions around these considerations. In this paper, we firstly describe a novel syntactic representation for morphosyntactic sub-word units (namely inflectional groups (IGs) in Turkish) which allows inter-IG relations to be discerned with perfect accuracy without having to hide lexical information. Secondly, we investigate alternative annotation schemes for coordination structures and present a better scheme (nearly 11% increase in recall scores) than the one in Turkish Treebank (Oflazer et al., 2003) for both parsing accuracies and compatibility for colloquial language.
منابع مشابه
تولید درخت بانک سازهای زبان فارسی به روش تبدیل خودکار
Treebanks is one of important and useful resource in Natural Language Processing tasks. Dependency and phrase structures are two famous kinds of treebanks. There have already made many efforts to convert dependency structure to phrase structure. In this paper we study an approach to convert dependency structure to phrase structure because of lack of a big phrase structure Treebank in Persian. A...
متن کاملStatistical Dependency Parsing for Turkish
This paper presents results from the first statistical dependency parser for Turkish. Turkish is a free-constituent order language with complex agglutinative inflectional and derivational morphology and presents interesting challenges for statistical parsing, as in general, dependency relations are between “portions” of words – called inflectional groups. We have explored statistical models tha...
متن کاملAn annotation scheme for Persian based on Autonomous Phrases Theory and Universal Dependencies
A treebank is a corpus with linguistic annotations above the level of the parts of speech. During the first half of the present decade, three treebanks have been developed for Persian either originally or subsequently based on dependency grammar: Persian Treebank (PerTreeBank), Persian Syntactic Dependency Treebank, and Uppsala Persian Dependency Treebank (UPDT). The syntactic analysis of a sen...
متن کاملExploring Treebank Transformations in Dependency Parsing
This paper presents a set of experiments performed on parsing the Basque Dependency Treebank. We have concentrated on treebank transformations, maintaining the same basic parsing algorithm across the experiments. The experiments can be classified in two groups: 1) feature optimization, which is important mainly due to the fact that Basque is an agglutinative language, with a rich set of morphos...
متن کاملUniversal Dependencies for Turkish
The Universal Dependencies (UD) project was conceived after the substantial recent interest in unifying annotation schemes across languages. With its own annotation principles and abstract inventory for parts of speech, morphosyntactic features and dependency relations, UD aims to facilitate multilingual parser development, cross-lingual learning, and parsing research from a language typology p...
متن کامل